Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfc: add platform support RFC #1065

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

msanft
Copy link
Contributor

@msanft msanft commented Dec 12, 2024

This adds an RFC for platform support and representation in Contrast. This is in very much of a bare-bones stage now, so reviews and critique are welcome.

@msanft msanft added the no changelog PRs not listed in the release notes label Dec 12, 2024
This adds an RFC for platform support and representation in Contrast. This is in very much of a bare-bones stage now, so reviews and critique are welcome.
@msanft msanft force-pushed the msanft/rfc/platform-support branch from 7459452 to 1650f8f Compare December 12, 2024 09:32
Comment on lines +75 to +76
This would also enable easy validation of user error, as the individual types can be checked for
validity at parsing time, making invalid states unrepresentable [^1].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that it changes much about the proposal at hand, but "make invalid states unrepresentable" is not achieved if I can do

myFancyPlatform := Platform{
    KubernetesDistribution: RKE2,
    Hypervisor: CloudHypervisor,
    HardwarePlatform: Unknown,
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under the assumption of only validating constructors being used in the program, of course. I do acknowledge that my example is counter-intuitive for that, but it's a simplification, after all.

So `AKS-CLH-SNP` could become something like this:

```go
type Platform struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional fields that come to mind: Snapshotter, ImageVariant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on ImageVariant?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with GPU support or without, debug or prod, ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think those should be additional fields, ultimately. That's the key point I want to convey in this proposal. It seems I need to make that more clear?


## Considerations

### Node-Installer Images
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do Platforms relate to runtimes? A runtime could be used with more than one image (not easily, but possible), but not with more than one hypervisor or snapshotter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtimes in the sense of Kata runtimes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kubernetes runtimes.

Comment on lines +104 to +105
for all platforms. This could also be a foundation for heterogeneous deployments that utilize multiple
platforms (for example GPU- and Non-GPU machines).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The node-installer could figure out a lot of configuration without user interaction - TEE flavour, containerd paths, maybe even hypervisor. On the other hand, outside the runtime the config is only used to decide which genpolicy to use and to populate reference values. I wonder whether we actually need to expose the entire runtime config to the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think large parts of it have to be exposed. It might be possible that you have an instance that has GPUs available but don't want to run GPU-enabled Contrast on it, for example.

Also, reference value generation needs almost all values, if I'm not mistaken?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You picked the one dimension that I did not list as candidate for automation ;)

It's still an interesting question whether / how we want to deal with mixed images. As a user, how do I tell Contrast to use a GPU image for pod A, but a plain image for pod B? I imagine a config file is not granular enough here. If there is only one image per runtime, how do I mix two runtimes with one coordinator? (I think that's a scenario we might want to support eventually, though)

The reference values need the image measurements, the genpolicy choice and the hardware reference values, where the last two are only a question of AKS-CLH vs. rest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reference values need the image measurements, the genpolicy choice and the hardware reference values, where the last two are only a question of AKS-CLH vs. rest.

For hardware reference values, I think it's also a matter of TDX vs. SNP (Genoa) vs. SNP (Milan) and so on, right?

About mixed images, how the customer decides to do that is TBD still, I think. I don't have the solution at hand for this.

About "one image per runtime": I think we could ship all variants in the node-installer if we can make them reasonably small?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't fill hardware values for bare metal, so it's just image measurements I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no changelog PRs not listed in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants